An Analysis on National Maternal Mortality
2025-08-05
Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models (Agresti 2015).
Can model non-normal outcome variables, such as binary, count, or proportion data
Incorporate random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion
Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics) (Lee and Nelder 1996)
Modeling non-normal outcomes, such as:
Binary outcomes (using logistic GLMMs) (Wang et al. 2017)
Count data (using Poisson or negative binomial GLMMs) (Candy 2000)
Proportions or rates (Salinas Ruíz et al. 2023)
Improving inference by accounting for both fixed effects (predictors of interest) and random effects (random variation across groups)
Reducing bias and inflated Type I error rates that can result from ignoring data structure (Thompson et al. 2022)
Frequently used in fields like medicine, ecology, education, and social sciences
One study explores the benefits of a zero-inflated Poisson GLMM (used to handle count data with an overabundance of zeros) applied to maternal mortality data in Ghana (Tawiah, Iddi, and Lotsi 2020)
Another study uses a GLMM to investigate the effect of particulate matter on child and maternal mortality globally
Let
\(\mathbf{y}\) be an \(N \times 1\) column vector of the outcome variable
\(\mathbf{X}\) be an \(N \times p\) matrix for the \(p\) predictor variables
\(\boldsymbol{\beta}\) be a \(p \times 1\) column vector of the fixed effects coefficients
\(\mathbf{Z}\) be an \(N \times q\) design matrix for the \(q\) random effects
\(\mathbf{u}\) be a \(q \times 1\) vector of random effects, and
\(\boldsymbol{\epsilon}\) be an \(N \times 1\) column vector of the residuals
Then the general equation for the model is given by:
\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}+\boldsymbol{\epsilon}\]
GLMMs typically include a link function that relates the response variable \(\mathbf{y}\) to a linear predictor, \(\boldsymbol{\eta}\), which excludes the residuals. So then \[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}\]
The link function is \(g(\cdot)\), where \[g(E(\mathbf{y}))=\boldsymbol{\eta}\] and \(E(\mathbf{y})\) is the expectation of \(\mathbf{y}\). The choice of link function depends on the outcome distribution. In this paper our data are overdispersed counts, which we model with a Negative Binomial distribution, so we use a log link function.
\[g(\cdot)=\log_e(\cdot)\]
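As a minimal sketch of how the linear predictor and the log link fit together, the Python snippet below builds \(\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}\) for three toy observations and maps it back to the mean scale with \(\mu=e^{\eta}\). All numbers and function names here are made up for illustration; they are not taken from our data or from the R analysis.

```python
import math


def linear_predictor(X, beta, Z, u):
    """Return eta = X beta + Z u, one value per observation (row)."""
    eta = []
    for x_row, z_row in zip(X, Z):
        fixed_part = sum(x * b for x, b in zip(x_row, beta))
        random_part = sum(z * v for z, v in zip(z_row, u))
        eta.append(fixed_part + random_part)
    return eta


def inverse_log_link(eta):
    """Invert g(mu) = log(mu): mu = exp(eta)."""
    return [math.exp(e) for e in eta]


# Toy values (assumed): intercept + one binary predictor, two groups.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
beta = [2.0, 0.5]
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # group membership indicators
u = [0.1, -0.1]                           # one random intercept per group

eta = linear_predictor(X, beta, Z, u)
mu = inverse_log_link(eta)
print([round(e, 3) for e in eta])
print([round(m, 3) for m in mu])
```

Because the link is a log, a shift in \(\eta\) multiplies the expected count rather than adding to it.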
The probability mass function of the Negative Binomial distribution is \[ f(y;k,\mu)=\frac{\Gamma(y+k)}{\Gamma(k)\,\Gamma(y+1)}\left(\frac{k}{\mu+k}\right)^{k}\left(1-\frac{k}{\mu+k}\right)^{y} \] The mean of the Negative Binomial is \(E(Y)=\mu\) and its variance is \(Var(Y)=\mu+\frac{\mu^2}{k}\), where the second term captures the overdispersion. \(k\) is called the dispersion parameter and inversely controls the amount of overdispersion. If \(k\) is very large relative to \(\mu^2\), the second term is approximately zero and a Poisson distribution may as well be used. The smaller the value of \(k\), however, the larger the overdispersion, and the Negative Binomial (with a log link) is then the more appropriate distribution.
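The mean and variance formulas can be checked numerically. The sketch below evaluates the Negative Binomial probability mass function on the log scale (using the \(\Gamma(y+1)=y!\) normalizing term) and sums it over a long grid of counts. The values \(\mu=10\) and \(k=2\) are arbitrary illustration values, and the function names are our own.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion parameter k."""
    log_p = (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (mu + k))
        + y * math.log(mu / (mu + k))  # 1 - k/(mu+k) = mu/(mu+k)
    )
    return math.exp(log_p)


def nb_moments(mu, k, y_max=2000):
    """Numerically sum the pmf to get total probability, mean, variance."""
    probs = [nb_pmf(y, mu, k) for y in range(y_max + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


mu, k = 10.0, 2.0                 # illustration values only
total, mean, var = nb_moments(mu, k)
formula_var = mu + mu ** 2 / k    # Var(Y) = mu + mu^2 / k
print(round(total, 6), round(mean, 6), round(var, 6), formula_var)
```

With a much larger \(k\) the term \(\mu^2/k\) shrinks and the variance approaches the Poisson value \(\mu\).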
On the scale of the link function, the expected response and the predictors have a linear relationship within the levels of the random effects.
The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).
The residuals and random effects are independent.
The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma_u^2\).
The Negative Binomial is well suited to overdispersed count data, which we suspect here because these are population-level counts
Longitudinal observations are not independent, so we use a GLMM that includes time (Year) as a random effect
The random effect accounts for variation that is not explained by our fixed effects
Analysis performed with R (R Core Team 2025)
Vital Statistics Rapid Release (VSRR) Provisional Maternal Death Counts and Rates, in the form of a .csv
Published by National Vital Statistics System, a collaboration between the National Center for Health Statistics (NCHS) and state vital record offices
Monthly death counts and death rates by race/ethnicity, age, and overall
Data from January 2019 to December 2024
Data is provisional and updated quarterly; becomes more reliable with more updates
Maternal Deaths between 1 and 9 are suppressed for privacy reasons
“Native Hawaiian or Other Pacific Islander, Non-Hispanic” has 70 NAs for Maternal Mortality, so we omit this subgroup entirely
“American Indian or Alaska Native, Non-Hispanic” has 58 NAs for Maternal Mortality Rate; because our model does not use the rate, these NAs do not affect modeling
| Name | Fixed_Effects | Random_Effects | Offset |
|---|---|---|---|
| all_glmmodel_nb | Ethnicity, Age_Group, Dobbs_Era | Year | log(Live_Births) |
| ethnicity_agegroup_glmmodel_nb | Ethnicity, Age_Group | Year | log(Live_Births) |
| allno_glmmodel_nb | Ethnicity, Age_Group, Dobbs_Era | Year | None |
| ethnicity_agegroupno_glmmodel_nb | Ethnicity, Age_Group | Year | None |
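
The Offset column can be read through the log link. As a short derivation, writing \(\eta_i\) for the linear predictor of observation \(i\) (our notation for this illustration), including \(\log(\text{Live Births}_i)\) with a coefficient fixed at 1 gives

\[\log\left(E[\text{Maternal Deaths}_i]\right)=\eta_i+\log(\text{Live Births}_i)\quad\Longleftrightarrow\quad\log\left(\frac{E[\text{Maternal Deaths}_i]}{\text{Live Births}_i}\right)=\eta_i\]

so the models with the offset describe deaths per live birth, while the models without it describe raw death counts.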
Family: nbinom2 ( log )
Formula:
Maternal_Deaths ~ Ethnicity + Age_Group + Dobbs_Era + (1 | Year)
Data: deaths_df3
Offset: log(Live_Births)
AIC BIC logLik -2*log(L) df.resid
4567.5 4614.2 -2272.7 4545.5 507
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
Year (Intercept) 0.03158 0.1777
Number of obs: 518, groups: Year, 6
Dispersion parameter for nbinom2 family (): 148
Conditional model:
Estimate Std. Error
(Intercept) -8.79687 0.07678
EthnicityBlack, Non-Hispanic 1.31563 0.02627
EthnicityWhite, Non-Hispanic 0.28329 0.02604
EthnicityHispanic 0.16204 0.02704
EthnicityAmerican Indian or Alaska Native, Non-Hispanic 1.65241 0.06285
EthnicityUnknown 0.04364 0.02750
Age_Group25-39 years 0.38817 0.01821
Age_Group40 years and over 1.81031 0.02053
Age_GroupUnknown NA NA
Dobbs_EraPost-Dobbs -0.22081 0.02303
z value Pr(>|z|)
(Intercept) -114.57 < 2e-16 ***
EthnicityBlack, Non-Hispanic 50.08 < 2e-16 ***
EthnicityWhite, Non-Hispanic 10.88 < 2e-16 ***
EthnicityHispanic 5.99 2.06e-09 ***
EthnicityAmerican Indian or Alaska Native, Non-Hispanic 26.29 < 2e-16 ***
EthnicityUnknown 1.59 0.113
Age_Group25-39 years 21.31 < 2e-16 ***
Age_Group40 years and over 88.17 < 2e-16 ***
Age_GroupUnknown NA NA
Dobbs_EraPost-Dobbs -9.59 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
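The information criteria in this output can be cross-checked by hand. The sketch below assumes that `df.resid` equals the number of observations minus the number of estimated parameters, which implies 11 parameters here (plausibly the nine estimated fixed-effect coefficients, the Year variance, and the dispersion parameter). The variable names are ours.

```python
import math

# Values copied from the model summary above.
neg2_log_lik = 4545.5  # reported as -2*log(L)
n_obs = 518            # Number of obs
df_resid = 507         # df.resid

# Assumption: df.resid = n_obs - (number of estimated parameters).
n_params = n_obs - df_resid

aic = neg2_log_lik + 2 * n_params                # AIC = -2 logL + 2p
bic = neg2_log_lik + n_params * math.log(n_obs)  # BIC = -2 logL + p ln(n)

print(f"parameters: {n_params}")
print(f"AIC: {aic:.1f}")
print(f"BIC: {bic:.1f}")
```

Under that assumption the computed values agree with the reported AIC and BIC to one decimal place.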
Our chosen model in regression equation form:
\[ \begin{align*} \log(\mathbb{E}[\text{Maternal Deaths}_i]) &= \beta_0 + \beta_1 \cdot \text{Black}_i \\ &+ \beta_2 \cdot \text{White}_i + \beta_3 \cdot \text{Hispanic}_i \\ &+ \beta_4 \cdot \text{American Indian or Alaska Native}_i \\ &+ \beta_5 \cdot \text{EthnicityUnknown}_i \\ &+ \beta_6 \cdot \text{Age 25-39}_i + \beta_7 \cdot \text{Age 40 Plus}_i \\ &+ \beta_8 \cdot \text{Post Dobbs}_i + b_{\text{Year}[i]} + \log(\text{Live Births}_i) \end{align*} \]
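To illustrate how this equation turns into a rate, the sketch below plugs in a few of the fitted coefficients, sets the Year random effect to zero, and drops the offset so that the result is expected deaths per live birth, scaled to 100,000 live births. It is an illustration of the arithmetic only: the helper name and dictionary keys are ours, and treating the terms as freely combinable is an assumption.

```python
import math

# Log-scale coefficients copied from the fitted-model summary.
INTERCEPT = -8.79687
COEF = {
    "Black, Non-Hispanic": 1.31563,
    "Age 40 years and over": 1.81031,
    "Post-Dobbs": -0.22081,
}


def expected_rate_per_100k(active_terms, year_effect=0.0):
    """exp(linear predictor without the offset), per 100,000 live births."""
    eta = INTERCEPT + sum(COEF[t] for t in active_terms) + year_effect
    return math.exp(eta) * 100_000


baseline = expected_rate_per_100k([])  # all reference levels
black = expected_rate_per_100k(["Black, Non-Hispanic"])

print(f"reference levels: {baseline:.1f} per 100,000 live births")
print(f"Black, Non-Hispanic: {black:.1f} per 100,000 live births")
print(f"ratio: {black / baseline:.2f}")
```

The ratio of the two rates equals the exponentiated coefficient, which is why the coefficients are later reported as incidence rate ratios.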
mean(Maternal_Deaths) var(Maternal_Deaths)
1 828.8889 30819.62
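A quick way to read these two numbers is the variance-to-mean ratio, which would be close to 1 for Poisson data. The sketch below computes it from the values printed above. It is a crude marginal check that ignores the predictors and the offset, so it only motivates the Negative Binomial choice rather than confirming it.

```python
# Sample mean and variance of Maternal_Deaths, copied from the output above.
mean_deaths = 828.8889
var_deaths = 30819.62

# A Poisson outcome would have variance roughly equal to its mean.
dispersion_ratio = var_deaths / mean_deaths
is_overdispersed = dispersion_ratio > 1.0

print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"overdispersed relative to Poisson: {is_overdispersed}")
```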
| Maternal Deaths | | | | | |
|---|---|---|---|---|---|
| Predictors | Log-Mean | Std. Error | CI | Statistic | p |
| (Intercept) | -8.80 | 0.08 | -8.95 – -8.65 | -114.57 | <0.001 |
| Ethnicity [Black, Non-Hispanic] | 1.32 | 0.03 | 1.26 – 1.37 | 50.08 | <0.001 |
| Ethnicity [White, Non-Hispanic] | 0.28 | 0.03 | 0.23 – 0.33 | 10.88 | <0.001 |
| Ethnicity [Hispanic] | 0.16 | 0.03 | 0.11 – 0.22 | 5.99 | <0.001 |
| Ethnicity [American Indian or Alaska Native, Non-Hispanic] | 1.65 | 0.06 | 1.53 – 1.78 | 26.29 | <0.001 |
| Ethnicity [Unknown] | 0.04 | 0.03 | -0.01 – 0.10 | 1.59 | 0.113 |
| Age Group [25-39 years] | 0.39 | 0.02 | 0.35 – 0.42 | 21.31 | <0.001 |
| Age Group [40 years and over] | 1.81 | 0.02 | 1.77 – 1.85 | 88.17 | <0.001 |
| Dobbs Era [Post-Dobbs] | -0.22 | 0.02 | -0.27 – -0.18 | -9.59 | <0.001 |
| Random Effects | | | | | |
| σ² | 8.00 | | | | |
| τ00 Year | 0.03 | | | | |
| ICC | 0.00 | | | | |
| N Year | 6 | | | | |
| Observations | 518 | | | | |
| Marginal R² / Conditional R² | 0.056 / 0.059 | | | | |
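
The ICC row can be reproduced from the two variance components, assuming the usual definition \(\text{ICC}=\tau_{00}/(\tau_{00}+\sigma^2)\), which the table's symbols suggest. The sketch below uses the unrounded Year variance from the model summary and the \(\sigma^2\) value as printed in the table; the function name is ours.

```python
def intraclass_correlation(tau00, sigma2):
    """Share of total variance attributable to the grouping factor."""
    return tau00 / (tau00 + sigma2)


year_variance = 0.03158   # Year (Intercept) variance from the model summary
residual_variance = 8.00  # sigma^2 as reported in the table

icc = intraclass_correlation(year_variance, residual_variance)
print(f"ICC = {icc:.4f}")
```

A value this close to zero means that, on this scale, very little of the remaining variation is attributed to differences between years.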
| Maternal Deaths | | | |
|---|---|---|---|
| Predictors | Incidence Rate Ratios | CI | p |
| (Intercept) | 0.00 | 0.00 – 0.00 | <0.001 |
| Ethnicity [Black, Non-Hispanic] | 3.73 | 3.54 – 3.92 | <0.001 |
| Ethnicity [White, Non-Hispanic] | 1.33 | 1.26 – 1.40 | <0.001 |
| Ethnicity [Hispanic] | 1.18 | 1.12 – 1.24 | <0.001 |
| Ethnicity [American Indian or Alaska Native, Non-Hispanic] | 5.22 | 4.61 – 5.90 | <0.001 |
| Ethnicity [Unknown] | 1.04 | 0.99 – 1.10 | 0.113 |
| Age Group [25-39 years] | 1.47 | 1.42 – 1.53 | <0.001 |
| Age Group [40 years and over] | 6.11 | 5.87 – 6.36 | <0.001 |
| Dobbs Era [Post-Dobbs] | 0.80 | 0.77 – 0.84 | <0.001 |
| Random Effects | | | |
| σ² | 8.00 | | |
| τ00 Year | 0.03 | | |
| ICC | 0.00 | | |
| N Year | 6 | | |
| Observations | 518 | | |
| Marginal R² / Conditional R² | 0.056 / 0.059 | | |
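
The incidence rate ratios are the exponentiated log-scale coefficients, and their intervals appear to be Wald intervals exponentiated in the same way. The sketch below recomputes two rows from the estimates and standard errors in the model summary, assuming a normal critical value of 1.96. The function name and dictionary layout are ours.

```python
import math

Z_CRIT = 1.96  # assumed normal critical value for a 95% Wald interval

# (estimate, standard error) on the log scale, from the model summary.
coefficients = {
    "Ethnicity [Black, Non-Hispanic]": (1.31563, 0.02627),
    "Dobbs Era [Post-Dobbs]": (-0.22081, 0.02303),
}


def incidence_rate_ratio(estimate, std_error, z=Z_CRIT):
    """Return (IRR, lower, upper) by exponentiating estimate +/- z * SE."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )


results = {name: incidence_rate_ratio(b, se) for name, (b, se) in coefficients.items()}
for name, (irr, lower, upper) in results.items():
    print(f"{name}: {irr:.2f} ({lower:.2f} to {upper:.2f})")
```

An IRR above 1 indicates a higher expected death rate than the reference level, and an IRR below 1 a lower one.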
| Maternal Deaths | | | |
|---|---|---|---|
| Predictors | Incidence Rate Ratios | CI | p |
| (Intercept) | 33.65 | 29.12 – 38.88 | <0.001 |
| Ethnicity [Black, Non-Hispanic] | 8.67 | 8.24 – 9.13 | <0.001 |
| Ethnicity [White, Non-Hispanic] | 11.09 | 10.54 – 11.67 | <0.001 |
| Ethnicity [Hispanic] | 4.81 | 4.56 – 5.07 | <0.001 |
| Ethnicity [American Indian or Alaska Native, Non-Hispanic] | 0.62 | 0.54 – 0.70 | <0.001 |
| Ethnicity [Unknown] | 3.80 | 3.60 – 4.01 | <0.001 |
| Age Group [25-39 years] | 4.95 | 4.78 – 5.12 | <0.001 |
| Age Group [40 years and over] | 1.04 | 1.00 – 1.08 | 0.058 |
| Dobbs Era [Post-Dobbs] | 0.80 | 0.77 – 0.84 | <0.001 |
| Random Effects | | | |
| σ² | 0.01 | | |
| τ00 Year | 0.03 | | |
| ICC | 0.73 | | |
| N Year | 6 | | |
| Observations | 518 | | |
| Marginal R² / Conditional R² | 0.957 / 0.988 | | |
The data are provisional and are updated quarterly, with revisions to both new and previously published counts, so further analysis may yield different results.
The data offered counts by age group and by ethnicity, but not by both together (e.g. maternal deaths for Black women 40 and over). The inclusion of such data would give a better indication of the relationship between the two subgroups.
Due to the COVID-19 pandemic’s impact on the healthcare system, access to regular healthcare was restricted. This likely had an impact on maternal mortality and may partially account for increased rates from 2020 to 2023.
More variables in the dataset would offer a better picture of the predictors of maternal mortality, specifically with regard to their relationship with ethnicity. Prenatal health, healthcare access, abortion access, and other prenatal behaviors would be useful.